Preliminaries
We now present some technical results that will be used repeatedly in the rest of the paper. A direct corollary of the Chernoff-Hoeffding bound (see, e.g., [...]). We also use the following variation of the Chernoff bound for sampling without replacement. We provide the lower bound proofs for the results in Section 4. We remark that these lower bounds [...]. In particular, lower bounds for offline multi-armed bandits are often information-theoretic and do not depend on adversarial instances. By Yao's minimax principle, it suffices to prove the lower bound for deterministic algorithms over [...] instances. We remark that even with the random arrival of arms, the sample lower bound in Theorem 1 still holds.
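For reference, a standard form of the Chernoff-Hoeffding bound alluded to above, stated here for i.i.d. samples $X_1, \dots, X_m$ bounded in $[0,1]$ with mean $\mu$ (the without-replacement variant mentioned in the excerpt is of the same shape but is not reproduced here), is
\[
\Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\right| \ge \varepsilon\right] \le 2\exp\left(-2 m \varepsilon^{2}\right).
\]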
Secure Best Arm Identification in the Presence of a Copycat
Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax-optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.
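As a point of reference for the baselines discussed in the abstract, the sketch below (hypothetical Python names, using NumPy) illustrates the naive secure strategy of playing all arms equally: the pull sequence is a fixed round robin, so it reveals nothing about the best arm, and the parameter vector is estimated by least squares at the end. Here `theta_star` is used only to simulate rewards; this is not the paper's coded-arms algorithm.

```python
import numpy as np

def naive_secure_bai(arms, theta_star, T, noise_std=1.0, rng=None):
    # Hypothetical sketch of the naive secure baseline from the abstract:
    # play all K arms equally often (a fixed round robin leaks nothing about
    # the best arm), then estimate the unknown parameter by least squares.
    # `arms` is a (K, d) array of arm vectors; `theta_star` is used here only
    # to simulate the linear rewards with Gaussian noise.
    rng = np.random.default_rng() if rng is None else rng
    K, _ = arms.shape
    X, y = [], []
    for t in range(T):
        a = t % K                                    # uniform, best-arm-oblivious play
        reward = arms[a] @ theta_star + noise_std * rng.standard_normal()
        X.append(arms[a])
        y.append(reward)
    theta_hat, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return int(np.argmax(arms @ theta_hat))          # index of the estimated best arm
```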
Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the gap in mean reward between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] has shown that when $\Delta_{[2]}$ is not known in advance, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta^{2}_{[2]})$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve a sample complexity of $O(\sum_{i=2}^{n}1/\Delta^2_{[i]})$ arm pulls with a sublinear memory size. In this work, we show that the ``right answer'' to this question is $\Theta(\log{n})$ passes (up to $\log\log{n}$ terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory (a memory of $o({n}/{\text{polylog}({n})})$ arms) and $O(\sum_{i=2}^{n}{1}/{\Delta^{2}_{[i]}}\cdot \log{(n)})$ arm pulls has to make $\Omega(\frac{\log{n}}{\log\log{n}})$ passes over the stream. We then show a nearly matching algorithm that, assuming knowledge of $\Delta_{[2]}$, finds the best arm with $O(\sum_{i=2}^{n}1/\Delta^2_{[i]} \cdot \log{n})$ arm pulls and a *single-arm* memory.
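For intuition only, the sketch below (hypothetical Python, not the paper's multi-pass algorithm) shows the simplest way a known gap $\Delta_{[2]}$ is typically exploited in a single pass with a single-arm memory: each arriving arm duels the current champion with a Hoeffding-sized budget. This attains only the worst-case order of $n/\Delta^{2}_{[2]}$ pulls (up to logarithmic factors), not the instance-sensitive bound targeted by the multi-pass question above.

```python
import math

def stream_best_arm(arm_stream, gap, delta=0.05):
    # Hypothetical single-pass, single-arm-memory "duel" sketch with a known
    # optimality gap `gap`; each arm in `arm_stream` is a callable returning a
    # reward in [0, 1]. Each comparison is sized via Hoeffding's inequality to
    # resolve a mean difference of `gap` with probability >= 1 - delta; a union
    # bound over the whole stream would use delta / n instead.
    budget = math.ceil(4 * math.log(2 / delta) / gap ** 2)
    champion = None
    for arm in arm_stream:
        if champion is None:
            champion = arm
            continue
        champ_mean = sum(champion() for _ in range(budget)) / budget
        chall_mean = sum(arm() for _ in range(budget)) / budget
        if chall_mean > champ_mean:                  # challenger wins the duel
            champion = arm
    return champion
```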
UCB algorithms for multi-armed bandits: Precise regret and adaptive inference
Qiyang Han, Koulik Khamaru, and Cun-Hui Zhang
Upper Confidence Bound (UCB) algorithms are a widely used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax optimality properties, a precise understanding of their regret behavior remains elusive. This gap has not only hindered the evaluation of their actual algorithmic efficiency, but also limited further developments in statistical inference for sequentially collected data. This paper bridges these two fundamental aspects, precise regret analysis and adaptive statistical inference, through a deterministic characterization of the number of arm pulls for a UCB index algorithm [Lai87, Agr95, ACBF02]. Our resulting precise regret formula not only accurately captures the actual behavior of the UCB algorithm for finite time horizons and individual problem instances, but also provides significant new insights into the regimes in which the existing theory remains informative. In particular, we show that the classical Lai-Robbins regret formula is exact if and only if the sub-optimality gaps exceed the order $\sigma\sqrt{K\log T/T}$. We also show that its maximal regret deviates from the minimax regret by a logarithmic factor, thereby settling the question of its strict minimax optimality in the negative. The deterministic characterization of the number of arm pulls for the UCB algorithm also has major implications for adaptive statistical inference. Building on the seminal work of [Lai82], we show that the UCB algorithm satisfies certain stability properties that lead to quantitative central limit theorems in two settings, including the empirical means of the unknown rewards in the bandit setting. These results have an important practical implication: conventional confidence sets designed for i.i.d. data remain valid even when data are collected sequentially.
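For concreteness, below is a minimal sketch of the classical UCB1 index rule of [ACBF02] for rewards in $[0,1]$; the paper analyzes a general family of UCB index algorithms, so this is only an illustrative instance with hypothetical function names.

```python
import math
import random

def ucb1(pull, K, T):
    # Minimal sketch of the UCB1 index rule [ACBF02] for rewards in [0, 1]:
    # after pulling each arm once, play the arm maximizing
    # empirical mean + sqrt(2 log t / n_a).
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                              # initialization: one pull per arm
        else:
            arm = max(range(K), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts, [s / c for s, c in zip(sums, counts)]

# Example: two Bernoulli arms with means 0.6 and 0.5.
means = [0.6, 0.5]
counts, estimates = ucb1(lambda a: float(random.random() < means[a]), K=2, T=10_000)
```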
A General Framework for Clustering and Distribution Matching with Bandit Feedback
Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, and Jonathan Scarlett
We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a $K$-armed bandit model in which some subset of the $K$ arms is partitioned into $M$ groups. Within each group, the random variable associated with each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes an outcome drawn from the random variable associated with that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or of the underlying partition. The task is to devise an online algorithm that learns the underlying partition of the arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value $\delta$. Several existing problems, including finding $M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms, fall under our general framework. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$. Furthermore, we develop a computationally efficient online algorithm based on the Track-and-Stop method and the Frank--Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches the lower bound. Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $\delta$ vanishes.
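Since the abstract names the Track-and-Stop method, the sketch below (hypothetical Python, with caller-supplied `weights_fn`, `glr_fn`, and `threshold_fn`) illustrates only the generic Track-and-Stop control loop with D-tracking-style forced exploration; the paper's actual allocation optimization (via Frank--Wolfe) and its stopping statistic for the clustering problem are not reproduced here.

```python
import math

def track_and_stop(pull, K, weights_fn, glr_fn, threshold_fn, delta=0.05, horizon=10**6):
    # Hypothetical sketch of a generic Track-and-Stop loop, not the paper's
    # algorithm. The caller supplies:
    #   weights_fn(counts, sums) -> target sampling proportions (length K),
    #   glr_fn(counts, sums)     -> a generalized likelihood-ratio statistic,
    #   threshold_fn(t, delta)   -> the stopping threshold beta(t, delta).
    counts = [0] * K
    sums = [0.0] * K
    for t in range(1, horizon + 1):
        starved = [a for a in range(K) if counts[a] < math.sqrt(t) - K / 2]
        if starved:                                   # forced exploration
            arm = min(starved, key=lambda a: counts[a])
        else:                                         # track the target proportions
            w = weights_fn(counts, sums)
            arm = max(range(K), key=lambda a: t * w[a] - counts[a])
        sums[arm] += pull(arm)
        counts[arm] += 1
        if glr_fn(counts, sums) > threshold_fn(t, delta):
            break                                     # confident enough: stop sampling
    return counts, sums                               # caller forms the answer from these
```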